183 research outputs found

    Accurate glottal model parametrization by integrating audio and high-speed endoscopic video data

    The aim of this paper is to evaluate the effectiveness of using video data for voice source parametrization in the representation of voice production through physical modeling. Laryngeal imaging techniques can be effectively used to obtain vocal fold video sequences and to derive time patterns of relevant glottal cues, such as fold edge position or glottal area. In many physically based numerical models of the vocal folds, these parameters are estimated from the inverse-filtered glottal flow waveform, obtained from audio recordings of the sound pressure at the lips. However, this model inversion process is often problematic and affected by accuracy and robustness issues. The paper discusses how video analysis of the fold vibration might be effectively coupled to the parametric estimation algorithms based on voice recordings, to improve accuracy and robustness of model inversio

    A Multimodal Learning System for Individuals with Sensorial, Neuropsychological, and Relational Impairments

    This paper presents a system for an interactive multimodal environment able (i) to train listening comprehension in various populations of pupils, both Italian and immigrant, having different disabilities and (ii) to assess speech production and discrimination. The proposed system is the result of a research project focused on pupils with sensorial, neuropsychological, and relational impairments. The project involves innovative technological systems that the users (speech therapists, psychologists, and preprimary and primary school teachers) could adopt for training and assessment of language and speech. Because the system is used in a real scenario (Italian schools are often affected by poor funding for education and teachers without informatics skills), the guidelines adopted are low-cost technology; usability; customizable system; robustness

    Fitting a biomechanical model of the folds to high-speed video data through bayesian estimation

    High-speed video recording of the vocal folds during sustained phonation has become a widespread diagnostic tool, and the development of imaging techniques able to perform automated tracking and analysis of relevant glottal cues, such as fold edge position or glottal area, is an active research field. In this paper, a vocal fold vibration analysis method based on the processing of visual data through a biomechanical model of the laryngeal dynamics is proposed. The procedure relies on a Bayesian non-stationary estimation of the biomechanical model parameters and state, to fit the fold edge position extracted from the high-speed video endoscopic data. This finely tuned dynamical model is then used as a state transition model in a Bayesian setting, and it yields a physiologically motivated estimation of upper and lower vocal fold edge position. Based on model prediction, a hypothesis on the lower fold position can be made even in complete fold occlusion conditions occurring during the end of the closed phase and the beginning of the open phase of the glottal cycle. To demonstrate the suitability of the procedure, the method is assessed on a set of audiovisual recordings featuring high-speed video endoscopic data from healthy subjects producing sustained voiced phonation with different laryngeal settings

    Incoherent Frequency Fusion for Broadband Steered Response Power Algorithms in Noisy Environments

    Steered response power (SRP) algorithms have been shown to be among the most effective and robust methods for direction of arrival (DOA) estimation in noisy environments. In broadband signal applications, the SRP methods typically perform their computations in the frequency domain by applying a fast Fourier transform (FFT) to a signal portion, calculating the response power on each frequency bin, and subsequently fusing these estimates to obtain the final result. We introduce a frequency response incoherent fusion method based on a normalized arithmetic mean (NAM). Experiments are presented that rely on the SRP algorithms for the localization of motor vehicles in a noisy outdoor environment, focusing our discussion on performance differences with respect to different signal-to-noise ratios (SNR), and on spatial resolution issues for closely spaced sources. We demonstrate that the proposed fusion method provides higher resolution for the delay-and-sum SRP, and improved performance for minimum variance distortionless response (MVDR) and multiple signal classification (MUSIC

    Sensitivity-based region selection in the steered response power algorithm

    The steered response power (SRP) algorithm is a well-studied method for acoustic source localization using a microphone array. Recently, different improvements based on the accumulation of all time difference of arrival (TDOA) information have been proposed in order to achieve spatial resolution scalability of the grid search map and reduce the computational cost. However, the TDOA information distribution is not uniform with respect to the search grid, as it depends on the geometry of the array, the sampling frequency, and the spatial resolution. In this paper, we propose a sensitivity-based region selection SRP (R-SRP) algorithm that exploits the nonuniform TDOA information accumulation on the search grid. First, high- and low-sensitivity regions of the search space are identified using an array sensitivity estimation procedure; then, through the formulation of a peak-to-peak ratio (PPR) measuring the peak energy distribution in the two regions, the source is classified as belonging to a high- or a low-sensitivity region, and this information is used to design an ad hoc weighting function of the acoustic power map on which the grid search is performed. Simulated and real experiments show that the proposed method improves the localization performance in comparison to the state-of-the-art